Linear Regression
22 January, 2024
You already…
Please install and load the following packages
Access the lecture slides from the course landing page
I am Ayush.
I am a researcher working at the intersection of data, law, development and economics.
I teach Data Science using R at Gokhale Institute of Politics and Economics
I am an RStudio (Posit) certified tidyverse instructor.
I am a researcher at the Oxford Poverty and Human Development Initiative (OPHI) at the University of Oxford.
Reach me
ayush.ap58@gmail.com
ayush.patel@gipe.ac.in
Ok, but what do these measures imply?
possum dataset from the openintro package
husbands_wives data from openintro
Call:
lm(formula = head_l ~ total_l, data = possum)
Coefficients:
(Intercept) total_l
42.7098 0.5729
Source : Introduction to Modern Statistics
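The fitted possum model above can be read as head_l = 42.7098 + 0.5729 × total_l. Below is a minimal sketch of using it for prediction, assuming only the two coefficients printed above. The total length of 80 is an arbitrary illustrative value, not one taken from the slides.

```r
# Coefficients copied from the lm() output above
b0 <- 42.7098  # intercept
b1 <- 0.5729   # slope for total_l

# Hypothetical total length, chosen only for illustration
total_l_new <- 80

# Predicted head length from the fitted line
head_l_hat <- b0 + b1 * total_l_new
print(head_l_hat)
```

If the fitted model object is kept (say, in a hypothetical variable named model), predict(model, newdata = data.frame(total_l = 80)) should give the same prediction up to the rounding of the printed coefficients.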
The general equation for a linear model can be written as \[ Y \approx \beta_0 + \beta_1X \]
\[\beta_0 \text{ is population intercept}\]
\[\beta_1 \text{ is population slope}\]
Our estimates are represented as:
\[ \hat\beta_0\] \[\hat\beta_1\]
starbucks dataset, find the regression equation for relationship between the ages of husbands and wives
teacher dataset, find the relationship between the amount paid to fica and the base salary of teachers
elmhurst data from openintro package
Source : Introduction to Modern Statistics
\[\hat y_i = \hat\beta_0 + \hat\beta_1x_i\] \[e_i = y_i - \hat y_i\]
\[RSS = e_1^2 + e_2^2 + \cdots + e_n^2\]
Least squares coefficient estimates
\[ \hat\beta_1 = \frac{\sum_{i=1}^{n}(x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n}(x_i - \bar x)^2} \]
\[ \hat\beta_0 = \bar y - \hat\beta_1\bar x \]
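The formulas above can be checked numerically. The following is a minimal sketch, assuming a small made-up dataset (the x and y values are invented for illustration). It computes the estimates by hand, then the residuals and RSS, and compares the estimates with what lm() returns.

```r
# Toy data, made up for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Least squares estimates from the closed-form formulas
b1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0_hat <- mean(y) - b1_hat * mean(x)

# Fitted values, residuals, and residual sum of squares
y_hat <- b0_hat + b1_hat * x
e <- y - y_hat
rss <- sum(e^2)

# lm() should give the same estimates
fit <- lm(y ~ x)
print(c(b0_hat, b1_hat))
print(coef(fit))
print(rss)
```

The hand-computed pair and coef(fit) should agree up to floating-point error, since lm() minimises the same RSS.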
lm() function is used to fit linear models in R
For every additional $1 of family income, the gift aid given to the student reduces by $0.0431
For every additional $1000 of family income, gift aid reduces by $43.1
teacher dataset, try to interpret the results from the intercept and the coefficient
bdims dataset, find the regression equation for how the weight is related to height and interpret the results
mariokart data in openintro
cond variable specifies whether the device is new or used
Call:
lm(formula = total_pr ~ cond, data = mariokart)
Coefficients:
(Intercept) condused
53.77 -10.90
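In the output above, condused acts as an indicator that is 1 for used items and 0 for new ones. The intercept (53.77) is therefore the average price for new items, and 53.77 - 10.90 = 42.87 is the average for used items. Below is a minimal sketch of the same idea, assuming a made-up data frame (toy is a hypothetical name and the prices are invented, not the mariokart data).

```r
# Made-up prices for illustration; not the mariokart data
toy <- data.frame(
  cond  = factor(c("new", "new", "new", "used", "used", "used")),
  price = c(50, 55, 60, 40, 45, 41)
)

fit <- lm(price ~ cond, data = toy)

# Group means computed directly
mean_new  <- mean(toy$price[toy$cond == "new"])
mean_used <- mean(toy$price[toy$cond == "used"])

# The intercept is the mean of the reference level ("new");
# the condused coefficient is the "used" mean minus the "new" mean
print(coef(fit))
print(c(mean_new = mean_new, difference = mean_used - mean_new))
```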
teacher dataset, see how the degree that the teacher has affects their base salaries
census data, check how sex of the individual affects their personal income
loans_full_schema dataset from openintro
# A tibble: 10,000 × 55
emp_title emp_length state homeownership annual_income verified_income
<chr> <dbl> <fct> <fct> <dbl> <fct>
1 "global config … 3 NJ MORTGAGE 90000 Verified
2 "warehouse offi… 10 HI RENT 40000 Not Verified
3 "assembly" 3 WI RENT 40000 Source Verified
4 "customer servi… 1 PA RENT 30000 Not Verified
5 "security super… 10 CA RENT 35000 Verified
6 "" NA KY OWN 34000 Not Verified
7 "hr " 10 MI MORTGAGE 35000 Source Verified
8 "police" 10 AZ MORTGAGE 110000 Source Verified
9 "parts" 10 NV MORTGAGE 65000 Source Verified
10 "4th person" 3 IL RENT 30000 Not Verified
# ℹ 9,990 more rows
# ℹ 49 more variables: debt_to_income <dbl>, annual_income_joint <dbl>,
# verification_income_joint <fct>, debt_to_income_joint <dbl>,
# delinq_2y <int>, months_since_last_delinq <int>,
# earliest_credit_line <dbl>, inquiries_last_12m <int>,
# total_credit_lines <int>, open_credit_lines <int>,
# total_credit_limit <int>, total_credit_utilized <int>, …
Call:
lm(formula = interest_rate ~ public_record_bankrupt, data = loans)
Coefficients:
(Intercept) public_record_bankrupt
12.3403 0.7042
Call:
lm(formula = interest_rate ~ verified_income, data = loans)
Coefficients:
(Intercept) verified_incomeSource Verified
11.099 1.416
verified_incomeVerified
3.254
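The output above has coefficients for Source Verified and Verified but none for Not Verified, so Not Verified is the level the other two are compared against. Below is a minimal sketch of inspecting and changing that baseline with levels() and relevel(). It assumes a small made-up factor (v is a hypothetical name) whose level names mirror verified_income. The last part turns the printed coefficients into the average rate for each level.

```r
# Made-up factor for illustration; level names mirror verified_income
v <- factor(c("Not Verified", "Verified", "Source Verified", "Verified"))

# The first level is the baseline that lm() compares the others against
print(levels(v))

# Make "Verified" the baseline instead
v2 <- relevel(v, ref = "Verified")
print(levels(v2))

# Average interest rates implied by the coefficients printed above
b0 <- 11.099
rates <- c(
  "Not Verified"    = b0,
  "Source Verified" = b0 + 1.416,
  "Verified"        = b0 + 3.254
)
print(rates)
```

Changing the baseline changes which coefficients are reported, but not the fitted average for any group.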
verified_income
Reference level: represents the default level that other levels are measured against
lm(interest_rate ~ verified_income + debt_to_income + public_record_bankrupt + term + credit_util + issue_month, data = loans)
Call:
lm(formula = interest_rate ~ verified_income + debt_to_income +
public_record_bankrupt + term + credit_util + issue_month,
data = loans)
Coefficients:
(Intercept) verified_incomeSource Verified
2.23430 1.09980
verified_incomeVerified debt_to_income
2.66796 0.02276
public_record_bankrupt term
0.48942 0.15417
credit_util issue_monthJan-2018
4.83832 0.04826
issue_monthMar-2018
-0.04700
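With several predictors, a prediction is the intercept plus each coefficient times the matching predictor value. Below is a minimal sketch using the coefficients printed above for one hypothetical borrower. All borrower values are invented. The sketch also assumes that term is measured in months, that credit_util is a proportion, and that public_record_bankrupt is a 0/1 indicator, none of which the output itself confirms.

```r
# Coefficients copied from the lm() output above
b <- c(
  intercept              = 2.23430,
  source_verified        = 1.09980,
  verified               = 2.66796,
  debt_to_income         = 0.02276,
  public_record_bankrupt = 0.48942,
  term                   = 0.15417,
  credit_util            = 4.83832,
  issued_jan_2018        = 0.04826,
  issued_mar_2018        = -0.04700
)

# Hypothetical borrower; every value here is made up for illustration
x <- c(
  intercept              = 1,
  source_verified        = 0,
  verified               = 1,    # income is "Verified"
  debt_to_income         = 20,
  public_record_bankrupt = 0,    # assumed 0/1 indicator
  term                   = 36,   # assumed to be in months
  credit_util            = 0.5,  # assumed to be a proportion
  issued_jan_2018        = 1,
  issued_mar_2018        = 0
)

# Predicted interest rate: sum of coefficient times predictor value
predicted_rate <- sum(b * x)
print(predicted_rate)
```

Each coefficient is interpreted holding the other predictors fixed, which is why the verified_income coefficients here differ from those in the single-predictor model.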
Using the penguins dataset from the palmerpenguins package, find and interpret the regression equation for the body mass of the penguins and their species